fix: correct CPU usage graph pinned at 100% by TerrifiedBug · Pull Request #14 · TerrifiedBug/vectorflow

TerrifiedBug · 2026-03-05T17:48:28Z

Summary

CPU usage graph was always pinned at 100% due to incorrect calculation
Vector's host_cpu_seconds_total is a per-core, per-mode counter. Summing all modes (including idle) across all cores yields a delta of roughly core count × elapsed wall-clock time, so the old formula always came out at or above 100%
Fix: track idle CPU seconds separately and compute CPU% = (total - idle) / total * 100
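
To make the arithmetic concrete, here is a minimal Go sketch that compares the two formulas on a single sampling interval. The function names and the 4-core, 10-second numbers are illustrative assumptions, not code from this PR (the real chart logic lives in the TypeScript frontend), and the zero-delta guard is likewise an assumption.

```go
package main

import "fmt"

// legacyCPUPercent mirrors the old formula: counter delta divided by
// elapsed wall-clock seconds. Hypothetical name, for illustration only.
func legacyCPUPercent(totalDelta, dtSeconds float64) float64 {
	return totalDelta / dtSeconds * 100
}

// cpuPercent mirrors the new formula: the busy share of all CPU-seconds
// accumulated in the interval. Returning 0 for a non-positive delta is an
// assumption made here to avoid dividing by zero.
func cpuPercent(totalDelta, idleDelta float64) float64 {
	if totalDelta <= 0 {
		return 0
	}
	return (totalDelta - idleDelta) / totalDelta * 100
}

func main() {
	// Assumed example: 4 cores over a 10 s interval accumulate 40 CPU-seconds
	// across all modes, 30 of which are idle or iowait.
	totalDelta, idleDelta, dt := 40.0, 30.0, 10.0

	fmt.Printf("legacy: %.0f%%\n", legacyCPUPercent(totalDelta, dt))
	fmt.Printf("fixed:  %.0f%%\n", cpuPercent(totalDelta, idleDelta))
}
```

With these numbers the legacy formula yields 400% before clamping, while the ratio-based formula yields 25% regardless of how many cores contributed to the totals.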

Changes

Full-stack fix across 7 files:

Agent scraper: parse mode label from host_cpu_seconds_total, sum idle+iowait into new CpuSecondsIdle field
Agent structs + heartbeat: add CpuSecondsIdle to HostMetrics struct and heartbeat builder
Server: accept cpuSecondsIdle in heartbeat Zod schema, store in DB
Prisma: add cpuSecondsIdle Float @default(0) column to NodeMetric + migration
Frontend: replace (cpuDelta / dtSeconds) * 100 with ((totalDelta - idleDelta) / totalDelta) * 100
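
The scraper change described above can be sketched roughly as follows. This is a simplified illustration, not the PR's scraper.go: the hostCPU type, the modeLabel and accumulate helpers, the sample label set, and the hand-rolled line parsing are all assumptions, and the sketch assumes sample lines carry no trailing timestamp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// hostCPU is a stand-in for the CPU-related part of HostMetrics.
type hostCPU struct {
	CpuSecondsTotal float64
	CpuSecondsIdle  float64
}

// modeLabel extracts the value of the mode label from a sample line such as
// `host_cpu_seconds_total{cpu="0",mode="idle"} 12.5`.
func modeLabel(line string) (string, bool) {
	const key = `mode="`
	start := strings.Index(line, key)
	if start < 0 {
		return "", false
	}
	rest := line[start+len(key):]
	end := strings.Index(rest, `"`)
	if end < 0 {
		return "", false
	}
	return rest[:end], true
}

// accumulate adds every host_cpu_seconds_total sample to the total and
// additionally adds idle and iowait samples to the idle counter.
func accumulate(lines []string) hostCPU {
	var h hostCPU
	for _, line := range lines {
		if !strings.HasPrefix(line, "host_cpu_seconds_total{") {
			continue // not a CPU sample
		}
		mode, ok := modeLabel(line)
		if !ok {
			continue // no mode label; skipped in this sketch
		}
		fields := strings.Fields(line)
		value, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			continue // malformed value
		}
		h.CpuSecondsTotal += value
		if mode == "idle" || mode == "iowait" {
			h.CpuSecondsIdle += value
		}
	}
	return h
}

func main() {
	lines := []string{
		`host_cpu_seconds_total{cpu="0",mode="idle"} 8`,
		`host_cpu_seconds_total{cpu="0",mode="iowait"} 0.5`,
		`host_cpu_seconds_total{cpu="0",mode="user"} 1.5`,
		`host_cpu_seconds_total{cpu="1",mode="idle"} 6`,
		`host_cpu_seconds_total{cpu="1",mode="system"} 4`,
		`host_memory_used_bytes 1024`,
	}
	h := accumulate(lines)
	fmt.Printf("total=%.1f idle=%.1f\n", h.CpuSecondsTotal, h.CpuSecondsIdle)
}
```

Both counters stay cumulative across cores, so the consumer only ever needs deltas between two heartbeats and never the core count.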

Test plan

Deploy updated agent and server
Verify CPU graph shows realistic utilization (not pinned at 100%)
Verify existing metrics (memory, disk, network) are unaffected
Confirm backward compat: old agents without cpuSecondsIdle default to 0 (CPU shows 100% until agent updates, same as before)

The CPU graph was pinned at 100% because host_cpu_seconds_total from Vector is a per-core, per-mode counter. Summing all modes (including idle) across all cores meant the delta always exceeded wall-clock time, so (delta/dt)*100 was always >100% and got clamped.

Fix: track idle CPU seconds separately and compute utilization as (total - idle) / total * 100, which is core-count independent and gives accurate whole-server CPU utilization.

Changes across the full stack:
- Agent scraper: filter by mode label, sum idle+iowait separately
- Agent structs/heartbeat: add CpuSecondsIdle field
- Server heartbeat route: accept and store cpuSecondsIdle
- Prisma schema + migration: add cpuSecondsIdle column
- Fleet router: return new field
- Frontend chart: new formula using idle delta

greptile-apps · 2026-03-05T17:50:42Z

Greptile Summary

This PR fixes a CPU usage graph that was permanently pinned at 100% by introducing a separate cpuSecondsIdle counter (idle + iowait modes) throughout the full stack — agent scraper, Go structs, heartbeat payload, Zod schema, Prisma model + migration, tRPC select, and frontend chart — and replacing the old wall-clock-time formula with (totalDelta - idleDelta) / totalDelta * 100.

Root cause fix is correct: the old approach divided accumulated CPU-seconds (all cores × all modes) by elapsed wall-clock seconds, which yields roughly core count × 100% and therefore always exceeded 100% on multi-core hosts. The new ratio-based formula is the standard way to compute CPU utilization from host_cpu_seconds_total counters.
iowait is grouped into the idle bucket (scraper.go lines 150-152). This is a semantic trade-off: I/O-bound workloads will report lower CPU%, which could mask disk pressure. Consider whether surfacing iowait separately or documenting the choice is warranted.
Backward compatibility is handled correctly: old agents omit cpuSecondsIdle, the server falls back to 0 when persisting, the DB column defaults to 0, and the frontend clamps the result. This reproduces the prior 100% display until agents update, as documented in the PR.
Migration is safe: NOT NULL DEFAULT 0 on a DOUBLE PRECISION column is a non-destructive, backward-compatible change for existing rows.
Minor indentation error in heartbeat.go: the new CpuSecondsIdle field is indented one tab level shallower than all surrounding fields; gofmt would flag this.

Confidence Score: 4/5

Safe to merge — the core formula change is mathematically correct and the full-stack propagation is consistent.
The fix correctly addresses the root cause, all layers are updated consistently, backward compatibility is maintained, and the migration is non-destructive. Score is 4 rather than 5 only because of the iowait semantic ambiguity and the minor gofmt indentation issue in heartbeat.go.
Files to review closely: agent/internal/metrics/scraper.go (iowait classification) and agent/internal/agent/heartbeat.go (indentation)

Important Files Changed

Filename	Overview
agent/internal/metrics/scraper.go	Correctly accumulates CpuSecondsIdle from idle+iowait mode labels; minor semantic concern about iowait classification.
agent/internal/agent/heartbeat.go	Correctly wires CpuSecondsIdle into the heartbeat payload; new line has a tab-level indentation error.
src/app/api/agent/heartbeat/route.ts	cpuSecondsIdle added to Zod schema as optional and persisted with correct null-default fallback; no auth or validation regressions.
src/components/fleet/node-metrics-charts.tsx	CPU% formula correctly changed to (totalDelta - idleDelta) / totalDelta; clamp guards and the i>0 check are preserved.
prisma/migrations/20260305100000_add_cpu_seconds_idle/migration.sql	Non-destructive ALTER TABLE with NOT NULL DEFAULT 0; backward-compatible for existing rows.

Sequence Diagram

sequenceDiagram
    participant V as Vector (Prometheus)
    participant S as scraper.go
    participant H as heartbeat.go
    participant API as /api/agent/heartbeat
    participant DB as PostgreSQL (NodeMetric)
    participant FE as node-metrics-charts.tsx

    V->>S: host_cpu_seconds_total{mode="idle"} += x
    V->>S: host_cpu_seconds_total{mode="iowait"} += y
    V->>S: host_cpu_seconds_total{mode="user"} += z
    note over S: CpuSecondsTotal += all modes<br/>CpuSecondsIdle += idle + iowait
    S->>H: HostMetrics{CpuSecondsTotal, CpuSecondsIdle}
    H->>API: POST heartbeat {cpuSecondsTotal, cpuSecondsIdle}
    API->>API: Zod validate (both optional)
    API->>DB: NodeMetric.create({cpuSecondsTotal, cpuSecondsIdle})
    FE->>DB: fleet.nodeMetrics query (cpuSecondsTotal, cpuSecondsIdle selected)
    DB-->>FE: time-series rows
    note over FE: cpuPercent = (totalDelta - idleDelta) / totalDelta * 100<br/>clamped to [0, 100]

Last reviewed commit: 1663a49
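
The final step of the diagram, turning stored counter rows into a clamped percentage series, might look like the Go sketch below. It is only an approximation of the idea: the actual chart code is TypeScript in node-metrics-charts.tsx, and the nodeMetric type, the function names, and the choice to emit 0 for a non-positive delta (for example after a counter reset) are assumptions.

```go
package main

import "fmt"

// nodeMetric is a stand-in for one stored NodeMetric row.
type nodeMetric struct {
	CpuSecondsTotal float64
	CpuSecondsIdle  float64
}

// clamp restricts v to the closed range [lo, hi].
func clamp(v, lo, hi float64) float64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// cpuPercentSeries returns one utilization point per consecutive pair of
// rows, so n rows yield n-1 points. Rows are assumed sorted by time.
func cpuPercentSeries(rows []nodeMetric) []float64 {
	if len(rows) < 2 {
		return nil
	}
	out := make([]float64, 0, len(rows)-1)
	for i := 1; i < len(rows); i++ {
		totalDelta := rows[i].CpuSecondsTotal - rows[i-1].CpuSecondsTotal
		idleDelta := rows[i].CpuSecondsIdle - rows[i-1].CpuSecondsIdle
		if totalDelta <= 0 {
			// Assumed handling for counter resets or duplicate samples.
			out = append(out, 0)
			continue
		}
		out = append(out, clamp((totalDelta-idleDelta)/totalDelta*100, 0, 100))
	}
	return out
}

func main() {
	// Rows from an updated agent that reports both counters.
	updated := []nodeMetric{{100, 70}, {140, 100}, {180, 110}}
	// Rows from an old agent: cpuSecondsIdle was stored as the default 0.
	legacy := []nodeMetric{{100, 0}, {140, 0}}

	fmt.Println(cpuPercentSeries(updated))
	fmt.Println(cpuPercentSeries(legacy))
}
```

The updated rows produce 25 and 75, while the legacy rows produce 100, which matches the backward-compatibility behavior described in the test plan.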

greptile-apps · 2026-03-05T17:50:45Z

agent/internal/agent/heartbeat.go

 					MemoryUsedBytes:  sr.Host.MemoryUsedBytes,
 					MemoryFreeBytes:  sr.Host.MemoryFreeBytes,
 					CpuSecondsTotal:  sr.Host.CpuSecondsTotal,
+				CpuSecondsIdle:   sr.Host.CpuSecondsIdle,


Misaligned indentation on new field

The CpuSecondsIdle line uses one fewer tab than every other field in the same struct literal. Running gofmt would flag this. While it compiles fine, it breaks visual alignment and will cause noisy diffs in future edits.

Suggested change

-				CpuSecondsIdle:   sr.Host.CpuSecondsIdle,
+					CpuSecondsIdle:   sr.Host.CpuSecondsIdle,


greptile-apps · 2026-03-05T17:50:46Z

agent/internal/metrics/scraper.go

+			if mode == "idle" || mode == "iowait" {
+				sr.Host.CpuSecondsIdle += value
+			}


Consider excluding iowait from the idle bucket

Classifying iowait as "idle" means I/O-bound workloads will show artificially low CPU utilization on the graph. For example, a system that's 80% blocked on disk reads will report ~20% CPU busy even though it's clearly under stress.

Standard tools like iostat and htop report iowait as a separate category precisely to make I/O pressure visible. The field is also named CpuSecondsIdle, which implies pure idle time.

If the intent is "CPU not doing compute work", renaming the field to CpuSecondsNonBusy (and documenting that it includes iowait) would at least make the semantics explicit. Alternatively, tracking idle only and displaying iowait as a separate series in the chart gives users richer diagnostic information.
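
One possible reading of the second alternative, keeping idle and iowait as separate counters and charting them as separate series, is sketched below in Go. Nothing here is from the PR: the cpuDeltas and cpuBreakdown types, the breakdown function, and the decision to exclude iowait from the busy share are all assumptions made for illustration.

```go
package main

import "fmt"

// cpuDeltas holds counter deltas for one interval, with iowait tracked
// separately from idle. Hypothetical type.
type cpuDeltas struct {
	Total  float64
	Idle   float64
	IOWait float64
}

// cpuBreakdown is the pair of series a chart could plot.
type cpuBreakdown struct {
	BusyPercent   float64
	IOWaitPercent float64
}

// breakdown splits the interval into a busy share (neither idle nor
// iowait) and an iowait share, both as percentages of the total delta.
func breakdown(d cpuDeltas) cpuBreakdown {
	if d.Total <= 0 {
		return cpuBreakdown{} // no data for this interval
	}
	return cpuBreakdown{
		BusyPercent:   (d.Total - d.Idle - d.IOWait) / d.Total * 100,
		IOWaitPercent: d.IOWait / d.Total * 100,
	}
}

func main() {
	// The reviewer's scenario: 80% of the interval is spent blocked on disk.
	b := breakdown(cpuDeltas{Total: 40, Idle: 0, IOWait: 32})
	fmt.Printf("busy=%.0f%% iowait=%.0f%%\n", b.BusyPercent, b.IOWaitPercent)
}
```

Under this split the busy line still reads about 20%, but the iowait series makes the disk pressure visible instead of folding it into idle time.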



TerrifiedBug merged commit e559235 into main Mar 5, 2026
1 check passed

TerrifiedBug deleted the fix/cpu-metrics-calculation branch March 5, 2026 17:51

github-actions bot added agent fix labels Mar 5, 2026
